Abstract

The representation of physical space has traditionally focused
on keyphrases such as “Computer Science Building” or “Physics
Department” that help us in describing and navigating physical
spaces. However, such keyphrases do not capture many properties
of physical space. As with the assignment of a keyword to
describe a piece of text, these constructs sacrifice meaningful
information for abstraction. We propose a system of spatial
representation based on richer, emergent language models that
encode information lost in keyphrase approaches. We use a mix of
wearable and ubiquitous computing environments for the
construction of these models. Wearable computers infer language
models of their hosts. These language models then act as
semantic paint over spaces in a ubiquitous computing
environment. Spaces collect this information and construct
representations based on interactions with augmented humans. A
prototype navigation system based on this theory is presented
and compared to traditional representations.


1.  Introduction 

Traditionally, the semantic labeling of spaces with building or
room names involves the manual task of assigning some keyphrase
to a space. Unfortunately, these assignments do not constitute a
rich representation of space. A computer science building is
more than just “computer science”; it also encompasses, to
varying degrees, “algorithms,” “artificial intelligence,”
“machine learning,” and many other topics depending on the
occupants of the building. That is, a person’s conception of a
building includes more than its structure (e.g., floor plans,
lighting). Especially when familiar with the objects and people
occupying a building, a person might think of that building as
something more abstract and meaningful than a collection of
generic objects and people. For example, the task of labeling a
building would be quite difficult if we were only given access
to its structural properties. Knowledge of the occupants
provides insight when constructing a meaningful description.
When we are given the names and homepages of the occupants, we
can better assign a useful label to a building.

We describe the development of a system of spatial
representation grounded in the interaction of people in space.
Related work in representation has been conducted in information
retrieval and collaborative filtering. In these areas, good
document or item representations are measured by an ability to
effectively rank a set of items with respect to a query or
active user. Likewise, the task of finding relevant spaces can
motivate the adoption of similar representations; we want to
rank spaces with respect to a description of what we are looking
for.

We explore approaches to spatial representation that rely upon
occupant-derived representation. Such a model requires both a
representation of the individual occupants as well as an
algorithm for constructing a representation of the space from
this information. Wearable computers provide an excellent
platform for the first task. Indeed, traditional user modeling
techniques deployed in a variety of domains serve as a lower
bound on the performance of wearable computers in constructing
representations of individuals. Already, wearable computer
systems have been developed which demonstrate the ability to
construct fine-grained models of individuals [5, 11]. To address
the second task, we adopt a computational partitioning of space
similar to Dataspace [7]. In this architecture, physical spaces
such as buildings or rooms maintain computational resources
providing a location for accumulating knowledge. Individuals
with wearable computers passing through these spaces provide the
personal information used to build spatial models. The problem
of constructing a representation of a space reduces to reasoning
about the collection of user models which pass through a space.

This paper develops the idea of interaction-based spatial
representations by starting from traditional approaches to
representation. A general theory of interaction-based
representation is developed in Section 2. In order to
contextualize our approach relative to previous methods of
representation, we have organized several existing architectures
into a set of categories. Using this theory of interaction-based
representations, we develop a method for constructing
interaction-based spatial representations in Section 3. In the
course of this process, we describe representational techniques
not previously explored. Having developed an algorithm for
building these representations, we describe the implementation
of a prototype navigation system in Section 4. We conclude by
placing our research in context and discussing future directions
in Sections 5 and 6.

2.  Interaction-based  Representation 

Being concerned with the representation of space, we begin by
discussing the various methodologies of constructing
representations. Our focus will not be on a personal
representation of space as is explored in artificial
intelligence. Instead, we will prefer an approach which focuses
on the social definition of a space. So, rather than having an
agent ask, “what is this space about?”, we have the space ask,
“what am I about?” In particular, we are interested in the
representation of spaces as a product of interaction with
people. However, we will first develop a more general method for
building interaction-based representations using previous work.

2.1  Simple  Interaction 

The first type of representation to consider is a simple
description of what is interacting in a system of objects. There
are many dimensions upon which interaction may occur: two people
speaking (linguistic interaction), several people being in
eyesight of each other (visual interaction), two cars colliding
(physical interaction). When considering a particular dimension,
some objects are more relevant than others. The two people in a
room are more relevant during a dialog than, for example, the
chairs these individuals are sitting on. Relevance is mentioned
as a means to reduce the system which we will have to describe.
Even though the door may be semi-relevant to a dialog, such a
system can be described as two people speaking. Therefore,
linguistic interactions can be described by an interaction
matrix of partners in conversations. That wearable computers
provide this type of human-level monitoring partially motivates
this work and explains why many of the examples involve people.
Beyond this, however, ubiquitous computing results in a similar
potential in physical objects. For example, if cars are
augmented with collision sensors, an interaction matrix can
describe the physical interaction.
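
As a minimal sketch of such an interaction matrix, the Python below counts pairwise interactions from a log of observed events. The event format, the function name `interaction_matrix`, and the sample names are assumptions made for illustration; they do not come from the system described in this paper.

```python
from collections import defaultdict

def interaction_matrix(events):
    """Count pairwise interactions from a list of (a, b) events."""
    counts = defaultdict(int)
    for a, b in events:
        if a == b:
            continue  # an object does not interact with itself
        # Store under a sorted key so that (a, b) and (b, a) share one cell.
        counts[tuple(sorted((a, b)))] += 1
    return dict(counts)

# Hypothetical conversation log: each tuple is one observed dialog.
conversations = [("alice", "bob"), ("bob", "alice"), ("alice", "carol")]
matrix = interaction_matrix(conversations)
```

The same structure would serve for the collision example if each event were a pair of car identifiers reported by the sensors.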

Much previous research implicitly adopts this interaction-based
framework. For example, collaborative filtering and Chalmers’
path-based information retrieval abstract information objects
(e.g., documents, movies, songs) and manipulate their
representations with respect to the people they interact with
[4, 1]. In these systems, people are represented by the objects
they have read, watched, or heard. Likewise, the information
objects are represented by the people who read, watch, or hear
them. Both representations ignore descriptions of the components
(i.e., people and items).
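
The two-way representation described here can be sketched as follows. The interaction log, the function names, and the use of Jaccard overlap as a similarity measure are illustrative assumptions; the cited systems may compare representations quite differently.

```python
from collections import defaultdict

def simple_representations(interactions):
    """Represent people by the items they touched, and items by their people."""
    by_person = defaultdict(set)
    by_item = defaultdict(set)
    for person, item in interactions:
        by_person[person].add(item)
        by_item[item].add(person)
    return dict(by_person), dict(by_item)

def overlap(a, b):
    """Jaccard overlap of two sets (an assumed, simple similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical reading log of (person, document) pairs.
log = [("alice", "doc1"), ("alice", "doc2"), ("bob", "doc2")]
people, items = simple_representations(log)
```

Note that nothing in either representation says what `doc2` is about; only identities are recorded.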

In terms of ubiquity, Davis et al. develop a representation of
nodes in an ad hoc wireless network with respect to
communicative interaction [6]. In such an environment, nodes may
have very limited communication range and high mobility.
Globally, the resulting network can be partitioned, dynamic, and
altogether difficult to navigate. The goal is to find a
relatively short and reliable route from one node to another
given only a destination’s identifier. Interaction is defined by
two nodes being within communication range. Here, a particular
node is represented by the history of other nodes with which it
has had possible communication. As with collaborative filtering,
a description of the components of this representation (i.e.,
other nodes) is lacking.

2.2  Interaction  Described 

A second type of representation is possible using the
descriptive history of interactions an object has participated
in. We believe that there is power in describing the interaction
itself. When people are speaking, one can describe the system
not just by who is speaking to whom but also by what words are
spoken between these individuals. When two cars collide, one can
use a range of values to describe the collision. Consequently, a
person can be represented by the words he or she has read,
written, heard, or spoken. Likewise, a car can be described by
the severity of collisions it has been involved in.
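
A small sketch of a described-interaction representation follows, assuming a hypothetical log of `(speaker, listener, text)` utterances and whitespace tokenization. Each participant accumulates the words of every exchange they took part in, whether spoken or heard.

```python
from collections import Counter

def described_representation(utterances):
    """Represent each person by the words exchanged in their interactions."""
    rep = {}
    for speaker, listener, text in utterances:
        words = text.lower().split()
        # Both parties took part in the interaction, so both absorb its words.
        for person in (speaker, listener):
            rep.setdefault(person, Counter()).update(words)
    return rep

# Hypothetical dialog log.
utterances = [
    ("alice", "bob", "machine learning"),
    ("bob", "carol", "learning guitar"),
]
rep = described_representation(utterances)
```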

Traditional information retrieval may be cast in this
representation scheme. The population of objects consists of the
users and their document collection. Interaction is defined by
reading a document and, hence, can be described by what is read.
That is, a document is only represented by the words that flow
between it and a reader (i.e., the text). More recent
information retrieval systems incorporate additional knowledge
into representation. For example, hypertext retrieval adds
inter-document interaction to representation [15, 9].

With respect to collaborative filtering, content-based schemes
incorporate linguistic knowledge about the interactions between
objects and observers [13]. So, in addition to being represented
by the people who have interacted with it, a particular item is
also represented by a description of the interaction, which may
be the text of a document or the synopsis of a film.

Both of the representational schemes describe important aspects
of the system. Simple interaction tells one a lot. But, while it
gives insight into the identities of the peers, this set alone
provides nothing beyond a social context. Knowing the ISBNs of
the books I read and the social security numbers of the people I
speak with tells one little about my interests. It may, as in
collaborative filtering, be able to represent interests in the
abstract sense of individuals’ overlapping book or dialog-peer
selections. Nevertheless, if given the text of all of the books
and the linguistic histories of all of my dialog-peers, then one
may be able to better describe my interests. Similarly, knowing
who is passing through a space can tell one a lot about
popularity and groups of people. Knowledge about the set of
interests of that group can go further even if we do not know
what is of particular interest in that space.

3.  Spatial  Semantic  Model 

The focus in our examples on people and language is not
accidental. First, people are readily monitored and represented
by wearable computers. One of the advantages of wearable
computers is their persistent existence with an individual. The
interaction of these wearable computers and other
physically-bound computers allows the exploration of the
representational methodologies we described above. Second,
language grounds representation in flexible, understandable
primitives. For our ends, a linguistic representation is useful
since it allows us to build systems that are queryable using
traditional information retrieval techniques [12, 19]. Words
provide a powerful interface potential and carry a history of
academic research. Given these aspects, we would like to build a
spatial representation system which incorporates both
interaction and linguistic representations. We will first
describe a framework for building linguistic representations.
Using this grounding and the ideas developed in Section 2, we
will develop a method to bind meaningful linguistic
representations to physical space.

3.1  Linguistic  Representation 

Wearable computers have access to a wealth of linguistic
information in the form of both text (email, web browsing, and
document composition) and speech (through speech recognition
technology). The result is a history of words which have passed
over the user’s lips, ears, and eyes. Complete histories are
informative but not compact and certainly not immediately
comparable.

We propose the use of information retrieval techniques for
abstracting from collections of words. The information retrieval
community represents documents in any number of ways: keywords,
subject headings, abstracts, term vectors. Recent advances in
information retrieval have found representational power in
language models of documents [16]. The intuition with language
models is that there is an underlying generative model for some
collection of words. Word collections act as a sample from this
model and can be used to estimate the “true”, underlying
language model. To a certain extent, for an individual, this
representation can serve to describe a set of interests. A
person who is interested in computer science is more likely to
speak, write, hear, or read about “algorithms” and “artificial
intelligence” than a person who is not interested in these
things at all.

3.2 Traditional Interaction-based Spatial Representations

Since we are basing our spatial representation on interaction,
an investigation of methods described in Section 2 is
appropriate.

First, we consider a representation based on simple interaction.
The resultant representations would be similar to items in the
collaborative filtering example or nodes in the ad hoc routing
example. A single space would be represented by a vector of
individuals who have passed through it. A collection of these
spatial representations would allow query by example, as in
collaborative filtering, or the search for a particular
individual, as in ad hoc routing. However, neither of these
features results in easy map-based interaction.
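
To illustrate query by example over such visitor vectors, the sketch below ranks spaces by cosine similarity to a chosen example space. The visit counts, space names, and the choice of cosine similarity are assumptions made for the example; the paper does not prescribe a similarity measure for this representation.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def query_by_example(spaces, example):
    """Rank all other spaces by similarity of their visitor vectors."""
    target = spaces[example]
    scores = {
        name: cosine(target, vec)
        for name, vec in spaces.items()
        if name != example
    }
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical visit counts per space.
spaces = {
    "lab_a": Counter(alice=3, bob=1),
    "lab_b": Counter(alice=2, bob=1),
    "lounge": Counter(carol=5),
}
ranking = query_by_example(spaces, "lab_a")
```

The query here must itself be a space (or a person), which is why this representation does not lend itself to free-text, map-based search.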

We have already described the advantages of linguistic
representations in Section 3.1. Practically, though, where does
the linguistic information about a space come from? Although it
is natural to think about linguistic interactions between
people, using language to characterize interaction with space is
not obvious. Let us consider the options. If it is claimed that
there are word-based representations of spaces, where do the
representations come from? Perhaps a linguistic interaction with
a space means a linguistic interaction is occurring in that
space. A linguistic history of a space would be constructed from
the history of words spoken within that space. Basically,
anything communicated between people in a space is monitored and
incorporated into its representation. Figure 1 provides an
interpretation of the source of linguistic information. The
resulting linguistic representations are based solely on the
linguistic interactions happening in a space. The identities of
the participants are ignored. A linguistic representation of
space built like this is reasonable but not practical. The
history of words spoken in a space is potentially sparse or
unrepresentative. Even though nary a word may be spoken in an
office, it can still have a representation based on the people
occupying it.


Figure 1. Linguistic information from linguistic interaction
occurring in a space: The linguistic representation of a room is
constructed solely from the words spoken in the room.


Figure 2. Linguistic information from linguistic representations
of the individuals in a space: The linguistic representation of
a room is constructed from an abstract representation of
occupants in a room. This abstract representation describes a
user’s set of interests.


3.3 User-based Interaction-based Representations

The shortcomings of traditional approaches to interaction-based
representation lead us to consider novel methods of constructing
such representations. One of the disadvantages described above
was the potential sparsity in immediate linguistic information.
In order to accumulate more data for our models, then, we
advocate the construction of models based upon the linguistic
representations of the people using that space. Wearable
computers provide not only the ability to monitor speech but
also persistent monitoring of a user. It is this monitoring
which allows the construction of rich models of user linguistic
patterns. The information about the users occupying the space is
then exploited to construct a representation of the room itself.
Figure 2 depicts the transmission of an entire linguistic
representation to the space. This user representation includes
the terms mentioned in the example in Figure 1 as well as an
individual context for those terms.

A subtle distinction between this approach and previous
approaches should be realized. Whereas the two interactive
representations described place objects in either an immediate
social or information context, our spatial representation
attempts to combine the two. The linguistic representations are
constructed not from immediate interaction such as text in a
document but from the linguistic representations of the users.
This would be akin to describing a document by the confluence of
linguistic representations of its readers. This is an
alternative representation we assume is a close approximation of
immediate interaction. Even in cases where information about
immediate interaction is provided, a reasoning about the
linguistic representations of users is potentially exploitable.
For example, if two statistics texts are equivalent with respect
to content, knowing that one is far more often used by computer
scientists perhaps tells us that this text is better suited for
a computer science curriculum than the other.


During each timestep,

1. wearable machines recompute linguistic representations of
their hosts

2. if a user enters a space,

(a) the new user’s wearable machine transmits its linguistic
representation to the computer associated with that space

(b) the spatially-bound machine recomputes a representation of
itself based on the new collection of linguistic user
representations


Figure 3. Representation construction algorithm
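
The construction procedure of Figure 3 can be sketched as a small class for the spatially-bound machine. This is an illustration under stated assumptions: the class name `SpaceMachine`, its methods, and the dictionary form of a language model are hypothetical, and the recomputation step assumes the uniform combination of user models that the prototype in Section 4 uses.

```python
class SpaceMachine:
    """Hypothetical spatially-bound machine that accumulates user models."""

    def __init__(self):
        self.user_models = {}     # user id -> most recent model received
        self.representation = {}  # word -> probability for this space

    def receive(self, user_id, model):
        """Step 2(a): a wearable transmits its host's language model."""
        self.user_models[user_id] = dict(model)
        self._recompute()

    def _recompute(self):
        """Step 2(b): rebuild the space model as a uniform mixture."""
        n = len(self.user_models)
        combined = {}
        for model in self.user_models.values():
            for word, p in model.items():
                combined[word] = combined.get(word, 0.0) + p / n
        self.representation = combined

# Two hypothetical users enter the same laboratory.
lab = SpaceMachine()
lab.receive("alice", {"learning": 0.5, "guitar": 0.5})
lab.receive("bob", {"learning": 1.0})
```

Keying the stored models by user means a returning user replaces their earlier model rather than being counted twice; whether repeat visits should carry extra weight is a design choice the paper leaves open.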

A space, then, accumulates knowledge about its occupants in the
form of these models from which composite representations can be
constructed. Components of this distribution will be reinforced
if people have common interests. So, even though several people
in a particular space may be quite different, the composite
representation should encompass the similarities between those
representations. In order to accomplish this with a collection
of language models, we perform a uniform combination of the
language models of the people who have passed through a
particular space. Figure 3 describes the behavior of the system
construction algorithm during execution.


For each space to be considered,

1. calculate the relevance of the space to the query

2. highlight the spaces according to this relevance measure.


Figure 4. Representation retrieval algorithm

The querying of a collection of spatial representations can be
thought of as analogous to reading a map for relevant areas. In
this case, reading is substituted by natural language querying
similar to modern information retrieval systems. Because our
spatial representations are probabilistic models, we can compute
the relevance of a space as the probability of that spatial
representation generating the query. The relevance is related,
then, to the likelihood of those words having been spoken by the
occupants who passed through that space. Using our map metaphor,
Figure 4 describes the retrieval algorithm.

3.4  Scenario:  Alice’s  Day  Out 

Consider the following scenario. Alice, an undergraduate
computer science student, owns, like everyone else, a wearable
computer which infers a linguistic representation from her web
browsing and document composing habits. Because Alice is
interested in areas such as wearable computing and artificial
intelligence, the language model allots a larger probability
mass to words such as “wearable,” “mobile,” “learning,” and
additional, related words. However, Alice is not one-dimensional,
so her language model also assigns relatively high probability
to terms such as “guitar,” “tremolo,” and “flamenco.” Clearly
Alice also has some interest in classical guitar.

Since Alice’s environment is augmented with spatial
representation machines, rooms in her department’s building have
representations associated with the language models of common
foot traffic. So, as Alice enters her laboratory, her wearable
communicates the inferred language model to the local spatial
representation machine. The spatial representation machine then
recalculates its representation based on this new information as
well as the history of language models it has been transmitted
by others. Because the majority of peers in Alice’s laboratory
also study machine learning, the combined language model
reinforces terms such as “learning,” “training,” and so on.

This afternoon, Alice visits a university campus she is
considering for graduate school. Unfamiliar with the campus,
Alice asks her wearable how to get to the machine learning
laboratory. The wearable contacts the campus directory which
maintains communication with all of the spatial representation
machines on campus. This central directory then estimates the
probability of Alice’s query being satisfied by the different
spaces. The campus directory constructs a campus map overlaid
with color whose intensity is relative to this probability. The
map is transmitted to Alice’s wearable. Alice notices that there
is a cluster of bright red spaces in the building next to her. A
visit to the brightest spaces results in Alice finding the
machine learning laboratory. Investigating other brightly
colored spaces in the building, Alice discovers that the
robotics laboratory also conducts interesting work in machine
learning.

Satisfied  with  the  machine  learning  research  on  campus, 
Alice  asks  about  guitar  playing  on  campus.  Disappointed  by 
the  initial  results  (almost  every  building  has  a  guitar  player), 
Alice  specifies  classical  guitar  playing.  A  small  cluster  of 
rooms  gets  highlighted  in  a  nearby  building.  Here,  Alice 
finds  a  music  department  where,  apparently,  flamenco  is 
embraced. 


4.  Prototype  Interface 


This spatial representation system is being deployed for the
Computer Science Building at the University of Massachusetts.
While a department-wide adoption of wearable computers is
welcome, it is not feasible at the moment. Linguistic data for
occupants of the building has been synthesized from publications
and home pages. This data will serve to construct language
models of the occupants of the building. However, before
constructing the models, some preprocessing was conducted.
First, text extracted from these documents was normalized by
stemming according to the KStem algorithm and dropping a list of
high-frequency, content-free stop words [10]. Terms occurring
only once in the entire collection were omitted from
calculations. For each individual in the department, his or her
documents were used to build a vector of term-frequency pairs.
Simple language models are built using the maximum likelihood
estimate,


$$P_o(w) = \frac{tf_o(w)}{\sum_{w'} tf_o(w')},$$

where $tf_o(w)$ is the count of term $w$ in the term vector for
building occupant $o$. This gives us a naive language model.
Unfortunately, such an estimate assigns zero probability to
unseen words. This problem is addressed by mixing the maximum
likelihood language model with a model of general English. In
this case, a general English model is constructed from the
entire collection of documents for all users. Therefore,

$$P'_o(w_i) = \lambda P_o(w_i) + (1 - \lambda) P_{GE}(w_i),$$

where $\lambda$ is a mixing parameter which we set to 0.80.
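
The preprocessing and estimation steps above can be sketched in Python as follows. Several pieces are assumptions made for illustration: the tiny stop list, whitespace tokenization, and the sample documents are invented, and stemming is omitted because KStem is not part of the standard library. Only the maximum likelihood estimate and the mixture with weight 0.80 follow the text directly.

```python
from collections import Counter

# Illustrative stop list only; the list used in the paper is not given.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is"}

def term_vector(docs):
    """Count non-stop-word tokens (stemming is omitted in this sketch)."""
    counts = Counter()
    for doc in docs:
        counts.update(t for t in doc.lower().split() if t not in STOP_WORDS)
    return counts

def drop_singletons(vectors):
    """Remove terms that occur only once in the entire collection."""
    collection = Counter()
    for tf in vectors.values():
        collection.update(tf)
    keep = {t for t, c in collection.items() if c > 1}
    return {
        o: Counter({t: c for t, c in tf.items() if t in keep})
        for o, tf in vectors.items()
    }

def mle(tf):
    """Maximum likelihood estimate: term count over total count."""
    total = sum(tf.values())
    return {w: c / total for w, c in tf.items()} if total else {}

def smoothed_model(tf_o, tf_collection, lam=0.8):
    """Mix an occupant's model with the collection-wide model."""
    p_o = mle(tf_o)
    p_ge = mle(tf_collection)
    return {w: lam * p_o.get(w, 0.0) + (1 - lam) * p_ge[w] for w in p_ge}

# Hypothetical documents for two occupants.
docs = {
    "alice": ["the learning robot", "learning guitar and learning"],
    "bob": ["robot vision", "the robot"],
}
vectors = drop_singletons({o: term_vector(d) for o, d in docs.items()})
collection = sum(vectors.values(), Counter())
alice_model = smoothed_model(vectors["alice"], collection)
bob_model = smoothed_model(vectors["bob"], collection)
```

In this toy collection Bob never uses “learning”, yet his smoothed model still assigns it nonzero probability through the collection-wide component, which is the purpose of the mixture.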

A model for the second and third floors of the Computer Science
Building was then constructed to allow simulation of occupancy.
The individuals were associated with their offices in the model.
Many offices are shared, demanding a combination of the language
models of the occupants. Composite language models were built by
uniformly combining the individual language models,

$$P(w_i \mid M_s) = \frac{1}{|O_s|} \sum_{j \in O_s} P'_j(w_i),$$

where $O_s$ is the set of occupants in the space $s$ for which
$M_s$ is a model.

This semantic model of the building is constructed locally on a
Xybernaut wearable computer [21]. Due to the relatively small
number of potentially relevant spaces, no optimization of the
indexing needed to be conducted. This would be necessary in very
large buildings or sets of spaces. We designed the system for
speech to allow the flexibility of traditional information
retrieval querying without the overhead of learning to use
traditional wearable keyboard alternatives. A user interacts
with the system by issuing speech queries recognized by IBM
ViaVoice runtime libraries [20]. The set of recognized words
constitutes the query. The system then generates a relevance
measure for all the spaces in the building based upon the
probability of the space’s semantic model, $M_s$, generating
those words:

$$P(q \mid M_s) = \prod_i P(q_i \mid M_s),$$

where $q$ is the sequence of query terms. These probabilities
are then used to mark up a map of the building that is presented
to the user on a touch panel display. Figure 5 shows this map
for the query “robotics.” The interface presents the user with
the current state of the query, which provides context for the
results. These results are displayed in two panels. The
left-hand panel shows the relevance of spaces in the building.
The right-hand panel displays the ranked list of relevant spaces
using manually assigned labels. We found that this list helps in
rapidly characterizing the space especially when used in
conjunction with the highlighted map. In our example, the system
highlights the robotics laboratory and offices of associated
people. Interestingly, the system also detects the interest of
machine learning and artificial intelligence laboratories in
robotics.
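
The relevance computation can be sketched as below. The space names and probabilities are invented for illustration, and summing log probabilities instead of multiplying raw probabilities is an assumption of this sketch, made for numerical stability; it yields the same ranking as the product of per-term probabilities.

```python
import math

def log_relevance(query_terms, model):
    """Log of the probability that a space model generates the query."""
    total = 0.0
    for term in query_terms:
        p = model.get(term, 0.0)
        if p == 0.0:
            # The model cannot generate this term at all.
            return float("-inf")
        total += math.log(p)
    return total

def rank_spaces(query_terms, space_models):
    """Return (space, score) pairs, most relevant space first."""
    scores = {s: log_relevance(query_terms, m) for s, m in space_models.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical smoothed models for two spaces.
space_models = {
    "robotics_lab": {"robotics": 0.30, "learning": 0.10},
    "office_201": {"robotics": 0.05, "learning": 0.20},
}
ranked = rank_spaces(["robotics", "learning"], space_models)
```

With smoothed models every in-vocabulary term has nonzero probability, so the `-inf` branch only arises for words outside the collection vocabulary. The resulting scores could then be rescaled to color intensities for the map.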


5.  User  Experience 

Having built a prototype system, we were interested in the
application of this visualization to the task of navigation of
space. Several computer science students with varying amounts of
experience in the Computer Science Building were given the
system to use for exploring the space. Most users were
enthusiastic about the system as a means of reducing the
overhead when investigating a new building. Traditionally, when
trying to determine where relevant research is being conducted
in a building, a coordination of web browsing and physical maps
is necessary. Our system combines this information into a single
interface to allow more efficient navigation. Users were able to
quickly find the offices and laboratories relevant to particular
interests. Some also gained an awareness of previously unknown
similarities between laboratories. Most participants were
largely disappointed with speech recognition performance, which
resulted in longer search times. One user recommended the option
for query reformulation so that the map state would change as
terms were added to a query.

6.  Related  Work 

With respect to representation, our approach is quite similar to
stigmergic or pheromone-based algorithms [3]. These systems
harness the distributed, socially-constructed representation of
traffic on a network for problems such as finding shortest
paths. Important to these algorithms is the notion of agents
leaving markers at geographic locations and having
representations emerge as a result of the marker accumulation.
Our work in spatial representations demonstrates the application
of this theory to domains outside of networking.

As  an  architecture,  the  Dataspace  model  comes  closest  to 
the  system  we  describe  [7].  While  Dataspace  describes  at  a 
high  level  how  to  partition  and  query  spaces,  the  authors  do 
not  describe  how  the  information  in  such  a  system  comes  to 
reside  where  it  does.  We  consider  our  system  to  be  an  ini¬ 
tial  attempt  at  exploiting  such  an  architecture  in  information 
retrieval. 

Brown’s work with stick-e notes is also related in the ascription
of data to spaces [2]. Stick-e notes are text data
stored  in  spaces.  This  text  is  broadcast  to  a  computer  user  if 
certain  contextual  information  is  satisfied.  Individuals  may 
then  leave  similar  notes  for  others  traveling  through  such  an 
augmented  space.  It  is  this  latter  part  which  we  are  automat¬ 
ing  so  that  instead  of  transmitting  a  text  message,  a  user 
transmits a complex representation. Incidentally, a spatial
machine in our system could also broadcast
its own representation to users passing through. This
message  encodes  not  only  a  spatial  representation  but  also  a 
potential user context. For example, many researchers have
described information retrieval systems which incorporate
contextual information such as location or room occupants
[8, 17]. Such systems could seamlessly
consider spatial context in the form of the models that we
present.

Figure 5. Spatial search results for the query “robotics”: The left-hand panel displays the relevant
spaces graphically. The right-hand panel displays a ranking of relevant spaces using manually
assigned labels.


Several  recent  artificial  intelligence  approaches  to  rep¬ 
resentation  attempt  to  learn  meaning  based  on  the  co¬ 
occurrence  of  spoken  words  and  physical  objects  [14,  18]. 
These  techniques  reinforce  specific  word-sensor  associa¬ 
tions  in  an  attempt  to  learn  word  meaning.  The  negotiated 
representation  resides  in  the  heads  of  the  individual  agents 
operating  within  the  environment.  In  other  words,  the  arti¬ 
ficial  intelligence  community  is  interested  in  a  vertical  ap¬ 
proach  to  intelligence  by  focusing  on  the  construction  of  a 
highly  sophisticated  agent  or  group  of  agents  acting  in  dy¬ 
namic  physical  environments.  In  some  ways,  our  work  is  an 
inversion  of  these  artificial  intelligence  initiatives.  The  sys¬ 
tem  attempts  to  learn  object  meaning  by  placing  the  repre¬ 
sentation  into  the  object  itself.  Hence,  we  are  interested  in  a 
horizontal  approach  to  intelligence  by  focusing  on  the  con¬ 
struction  of  sophisticated  dynamic  physical  environments. 
Agents  hold  no  privileged  place. 


7.  Conclusion 

We  have  presented  a  system  to  construct  rich,  emergent 
spatial  representations.  The  representations  result  in  mean¬ 
ingful  spaces  and  aid  in  visualization  and  navigation.  In 
designing  the  representational  system,  a  novel  method  for 
approximating  immediate  linguistic  representation  was  de¬ 
veloped. 

There  are  several  extensions  to  the  system  we  are  cur¬ 
rently  considering.  The  temporal  and  dynamic  aspects  of 
these  emergent  representations  remain  unexplored.  Realis¬ 
tic  movement  models  would  be  necessary  for  these  exper¬ 
iments.  We  are  investigating  the  acquisition  of  empirical 
movement  data  for  the  faculty  and  students  in  our  system. 
By incorporating movement into our model of the building,
spatial representations can be constructed using different
transformations on the interaction histories. For example,
considering only a short, recent history of people
occupying  a  space  may  reduce  the  accuracy  of  the  repre¬ 
sentation  but  will  make  the  representation  more  robust  to 
the  dynamism  of  shared  spaces. 
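
The trade-off between a full and a short, recent interaction history can be illustrated with a minimal sketch. It assumes, purely for illustration, that each visit contributes a bag of words drawn from the visitor's language model and that a space pools these counts into a unigram distribution; the `Space` class, its `window` parameter, and the sample terms are hypothetical and are not part of the prototype.

```python
from collections import Counter, deque


class Space:
    """Hypothetical space that accumulates term counts from its visitors."""

    def __init__(self, window=None):
        # window=None keeps the full interaction history;
        # an integer keeps only that many most recent visits.
        self.history = deque(maxlen=window)

    def visit(self, terms):
        """Record one visit as a bag of words from the visitor's language model."""
        self.history.append(Counter(terms))

    def representation(self):
        """Return a unigram distribution pooled over the retained visits."""
        pooled = Counter()
        for counts in self.history:
            pooled.update(counts)
        total = sum(pooled.values())
        return {term: n / total for term, n in pooled.items()} if total else {}


# Illustrative data: the occupants of a shared room change over time.
visits = [
    ["robotics", "vision"],
    ["robotics", "planning"],
    ["databases", "indexing"],
    ["databases", "queries"],
]

full = Space()             # full history
recent = Space(window=2)   # short, recent history
for terms in visits:
    full.visit(terms)
    recent.visit(terms)

full_rep = full.representation()
recent_rep = recent.representation()
```

Under these assumptions the full history still assigns weight to the terms of earlier occupants, while the windowed history reflects only the latest visitors: it forgets older evidence but tracks the change in occupancy.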

While  the  prototype  system  holds  promise,  limitations 
exist. The types of queries possible are limited by the amount
of  representational  power  in  text  information  related  to  a 
space.  For  example,  it  is  unlikely  that  the  system  would 
work well on queries for subway stations, restaurants, or
other public places. The people occupying these places are
too  diverse.  Inferring  meaningful  representations  for  these 
spaces from individuals’ language data may not be possible,
but  we  believe  useful  linguistic  histories  exist  somewhere  in 
the  environment. 

Wearable  computers  provide  the  ability  to  model  a  vast 
amount  of  user  interaction  beyond  words.  Several  initia¬ 
tives  to  model  context  reveal  the  ability  to  model  abstract 
states such as “walking” or “sitting” [5, 11]. One can imagine
other abstract states such as “hammering”. If
such  states  were  communicated  to  objects  in  the  environ¬ 
ment,  then  we  could  also  imagine  representing  objects  by 
the  ways  they  have  been  used.  For  example,  a  hammer 
would most often be used for hammering, though a shoe may
also be used for the same task. An agent confronted with the
need  to  hammer  would  not  have  to  reason  about  the  ham¬ 
mering  properties  of  objects  in  the  environment.  Instead, 
it  may  merely  seek  those  objects  whose  representations  in¬ 
clude  hammering. 
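
As a rough sketch of this idea, one might let each object keep a tally of the abstract states reported by wearables while the object was in use, and let an agent rank objects by the share of their recorded use devoted to the needed state. The `AugmentedObject` class, the `seek` function, and the usage counts below are hypothetical illustrations and do not describe an implemented system.

```python
from collections import Counter


class AugmentedObject:
    """Hypothetical object that records the states in which it has been used."""

    def __init__(self, name):
        self.name = name
        self.usage = Counter()

    def record(self, state):
        """Accumulate one observation of the object being used in `state`."""
        self.usage[state] += 1

    def affinity(self, state):
        """Fraction of all recorded uses of this object that were `state`."""
        total = sum(self.usage.values())
        return self.usage[state] / total if total else 0.0


def seek(objects, state):
    """Return objects whose usage history includes `state`, best match first."""
    candidates = [obj for obj in objects if obj.usage[state] > 0]
    return sorted(candidates, key=lambda obj: obj.affinity(state), reverse=True)


# Illustrative usage histories.
hammer, shoe, mug = AugmentedObject("hammer"), AugmentedObject("shoe"), AugmentedObject("mug")
for _ in range(9):
    hammer.record("hammering")
hammer.record("prying")
shoe.record("hammering")
for _ in range(19):
    shoe.record("walking")
mug.record("drinking")

ranked = [obj.name for obj in seek([shoe, mug, hammer], "hammering")]
```

With these counts the hammer ranks ahead of the shoe and the mug is excluded, without the agent reasoning about the physical properties of any object.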